feat(seccomp): ExtraHandler — user-supplied syscall handlers#20

Open
dzerik wants to merge 5 commits into multikernel:main from dzerik:feature/extra-handlers

Conversation

@dzerik

@dzerik dzerik commented Apr 24, 2026

Summary

Adds a public extension point for downstream crates that need to register their own seccomp-notification handlers alongside sandlock's builtin chroot/cow/procfs/network/port_remap logic.

Motivation. Downstream crates that want to intercept additional syscalls in the same supervisor task as sandlock's builtins have no clean way to do it today — one SECCOMP_FILTER_FLAG_NEW_LISTENER per process means a single listener, so a second supervisor cannot run alongside. The only alternative is forking sandlock or patching notif::supervisor wholesale.

API.

  • New type dispatch::ExtraHandler { syscall_nr, handler }.
  • New entry Sandbox::run_with_extra_handlers(policy, cmd, extras).
  • Existing Sandbox::run() delegates to it with empty extras — zero behaviour change for current callers.

Ordering contract (documented + tested).

  • Builtins register first (chroot path normalization, COW, procfs, …).
  • Extras appended last, in the Vec order.
  • Chain stops at first non-Continue — user handlers cannot subvert builtin confinement.
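
The ordering contract above can be modelled in a few lines. This is a minimal sketch, not sandlock's code: `Action`, `Handler`, `Table`, and `build` are hypothetical stand-ins, and the synchronous closure signature is an assumption made to keep the example self-contained (the real handlers are async and receive the notification, the supervisor context, and the notif fd).

```rust
use std::collections::HashMap;

/// Simplified stand-in for sandlock's `NotifAction` (the real enum has more variants).
#[derive(Debug, Clone, PartialEq)]
pub enum Action {
    Continue,
    Errno(i32),
}

/// Hypothetical synchronous handler that only sees the syscall number.
pub type Handler = Box<dyn Fn(i64) -> Action + Send + Sync>;

#[derive(Default)]
pub struct Table {
    chains: HashMap<i64, Vec<Handler>>,
}

impl Table {
    /// Append a handler to the chain for `nr`; call order is registration order.
    pub fn register(&mut self, nr: i64, h: Handler) {
        self.chains.entry(nr).or_default().push(h);
    }

    /// Walk the chain and return the first non-`Continue` result.
    pub fn dispatch(&self, nr: i64) -> Action {
        for h in self.chains.get(&nr).into_iter().flatten() {
            let action = h(nr);
            if action != Action::Continue {
                return action; // handlers later in the chain never run
            }
        }
        Action::Continue
    }
}

/// Build a table with builtins first, then extras in `Vec` order.
pub fn build(builtins: Vec<(i64, Handler)>, extras: Vec<(i64, Handler)>) -> Table {
    let mut table = Table::default();
    for (nr, h) in builtins.into_iter().chain(extras) {
        table.register(nr, h);
    }
    table
}
```

With a builtin that returns `Errno` registered ahead of an extra on the same syscall number, `dispatch` returns the builtin's result and the extra is never called, which is the property that keeps extras from overriding builtin confinement.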

Docs.

  • docs/extension-handlers.md: design rationale, security boundary, panics policy, non-goals, downstream sketch.
  • crates/sandlock-core/examples/openat_audit.rs: runnable example.

Minor bump 0.6 → 0.7 suggested.

Test plan

  • 4 new unit tests in dispatch::extra_handler_tests (ctor, insertion order, append-after-builtin, empty-extras nop) — passing
  • All 215 unit tests pass
  • Example openat_audit.rs runs against a python3 -c guest

@dzerik force-pushed the feature/extra-handlers branch from 5f2b730 to 71c5724 on April 24, 2026, 14:00
@congwang-mk
Contributor

congwang-mk commented Apr 26, 2026

Thanks for the PR!

Two main issues:

  1. The mechanism (split builtins-first, extras-after; security boundary preserved) is the right shape.
    What's missing is the plumbing through to the BPF filter so that extras can actually intercept the
    syscalls they're written for. Without that, the API ships a footgun: registers handlers that silently
    never fire, with no error to tell the user why.

  2. Tests that actually exercise dispatch. The current tests verify Vec::push, not the security contract.
    At minimum, an integration test that registers an extra returning NotifAction::Errno(libc::EACCES) and
    confirms the child sees EACCES.

@dzerik force-pushed the feature/extra-handlers branch 3 times, most recently from 1d0783d to 431c207 on April 27, 2026, 07:51
@dzerik
Author

dzerik commented Apr 27, 2026

You were right on both counts — the missing BPF plumbing was the
load-bearing gap, and the original tests were verifying Vec::push
rather than the security contract. Reworked in 431c207, details below.

1. BPF plumbing

  • Sandbox::do_spawn collects the syscall numbers from the
    Vec<ExtraHandler> before fork and threads them into the child via a
    new context::ChildSpawnArgs.extra_syscalls field.
  • The child merges them into notif_syscalls(policy) before
    bpf::assemble_filter, with sort_unstable + dedup so a syscall
    registered by both a builtin and an extra produces a single JEQ.
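
For illustration, the merge step described above might look like the following. It is a sketch under assumptions: `merge_notif_syscalls` is a hypothetical name and `u32` is an assumed element type; only the `sort_unstable` + `dedup` combination comes from the description.

```rust
/// Merge builtin and extra syscall numbers into one sorted, duplicate-free list,
/// so the filter assembler emits exactly one comparison per syscall.
pub fn merge_notif_syscalls(builtin: &[u32], extra: &[u32]) -> Vec<u32> {
    let mut merged: Vec<u32> = builtin.iter().chain(extra).copied().collect();
    merged.sort_unstable();
    // `dedup` only removes adjacent duplicates, which is why the sort comes first.
    merged.dedup();
    merged
}
```

A number present in both inputs appears once in the output, so a syscall claimed by both a builtin and an extra yields a single entry.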

While wiring this up I noticed an adjacent footgun: the cBPF program emits
notif JEQs before deny JEQs, so an extra registered on a syscall in
DEFAULT_DENY_SYSCALLS (e.g. mount) would convert a kernel-deny into a
user-supervised path and let a Continue from the handler bypass deny via
SECCOMP_USER_NOTIF_FLAG_CONTINUE. Sandbox::run_with_extra_handlers
now validates extras against the policy's deny list at registration time
and returns SandboxError::Child naming the offending syscall —
surfacing the "no error to tell the user why" gap you flagged.
Documented in §3.0.1.
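
A rough model of the registration-time guard, assuming the deny list has already been resolved to a set of syscall numbers. `validate_extras` and `DeniedSyscall` are hypothetical names for this sketch; the real check lives in `validate_extras_against_policy` and reports through `SandboxError::Child`.

```rust
use std::collections::HashSet;

/// Hypothetical error carrying the offending syscall number.
#[derive(Debug, PartialEq)]
pub struct DeniedSyscall(pub i64);

/// Reject the first extra whose syscall number is also on the deny list.
/// An empty deny set accepts every extra.
pub fn validate_extras(extra_nrs: &[i64], deny: &HashSet<i64>) -> Result<(), DeniedSyscall> {
    match extra_nrs.iter().copied().find(|nr| deny.contains(nr)) {
        Some(nr) => Err(DeniedSyscall(nr)),
        None => Ok(()),
    }
}
```

Running the check before fork means a conflicting registration fails loudly in the parent instead of producing a filter in which the notif comparison shadows the deny comparison.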

2. Tests that actually exercise dispatch

The unit-level Vec::push checks remain (cheap regression cover); on top
there are now seven integration tests that drive the full kernel path:

  • extra_handler_intercepts_syscall_outside_builtin_set — the case you
    asked for: SYS_uname (not in any builtin's notif list under default
    policy), handler returns Errno(EACCES), the guest observes EACCES.
  • extra_handler_continue_lets_syscall_proceed: Continue becomes
    SECCOMP_USER_NOTIF_FLAG_CONTINUE and the kernel resumes the syscall.
  • extra_handler_runs_after_builtin_returns_continue: on SYS_openat,
    the always-on /proc-virt builtin returns Continue for
    non-/proc paths and the extra observes those openats.
  • builtin_non_continue_blocks_extra — symmetric half: openat on
    /proc/1/cmdline is rejected by the procfs builtin and is never
    observed by the extra, while a peer openat on /etc/hostname is.
    Handler reads the path via process_vm_readv so the assertion is
    structural rather than counter-based.
  • chain_of_extras_runs_in_insertion_order — two extras on SYS_uname,
    first returns Continue, second Errno(EACCES); counters increment
    in lock step (c1 == c2) and the guest sees the EACCES.
  • extra_handler_on_default_deny_syscall_is_rejected: SYS_mount
    registration is refused up-front with a descriptive error.
  • empty_extras_preserves_default_behaviour — backwards compatibility.

Cosmetic

confine_child had grown to seven positional parameters; packed into
ChildSpawnArgs so the call site stays readable.

Deliberately deferred

  • HandlerFn is still Box<dyn Fn ...> + Send + Sync rather than a
    trait Handler { async fn ... }; happy to convert if you'd prefer
    the trait shape, but it's an API-shape change that probably wants
    its own discussion.
  • HandlerPriority::Before (audit handlers seeing pre-builtin
    arguments) remains in ## 4 Non-goals in the doc — orthogonal,
    add when there's a concrete use case.
  • Panic isolation around user handlers stays the responsibility of
    the downstream crate (documented in §3.4 with a catch_unwind
    example) — wrapping every dispatch in catch_unwind by default
    would mask bugs.

Diff stat: 8 files, ~1046 / -9. All 215 unit tests pass; integration
suite passes modulo the same pre-existing 54-test failure set observed
on origin/main (kernel/capability env, unrelated to this change).
Latest head: 431c207.

@congwang-mk
Contributor

No test for policy.deny_syscalls?

Comment thread docs/extension-handlers.md Outdated
non-[`NotifAction::Continue`](../crates/sandlock-core/src/seccomp/notif.rs)
result wins.

This patch exposes a **public extension point**:
Contributor

Please make it a formal document, it reads like a patch description.

Author

Rewrote as a design doc. Removed PR-narrative phrasing ("this patch", "available since 0.7 (branch ...)", "two concrete use cases motivate this API"); reorganised: API → Semantics (Ordering, Return values, Continue-site safety) → Security boundary (Extras can/cannot, BPF coverage, Deny-list bypass guard) → Panics → Use cases → Limitations → Backwards compatibility → Downstream usage. Present tense throughout; ordering invariants now reference both the unit and integration tests inline.


// builtin is index 0, extra is index 1
let chain = table.chains.get(&libc::SYS_openat).unwrap();
assert_eq!(chain.handlers.len(), 2, "two handlers expected");
Contributor

How about ordering?

Author

Right — those tests only counted handlers (chain.handlers.len()); same Vec::push-not-the-contract pattern you flagged the first time, just on a new test. Replaced both with three that drive DispatchTable::dispatch directly against a minimal SupervisorCtx (built from the per-state new()s — no kernel, no fds, handlers ignore ctx):

  • dispatch_walks_chain_in_registration_order — three handlers all returning Continue, asserts the recorded tag sequence is [1, 2, 3].
  • dispatch_runs_builtin_before_extra — builtin via register, extra via ExtraHandler; asserts [B, E] and Continue propagates.
  • dispatch_stops_at_first_non_continue — first handler returns Errno(EACCES); asserts the second handler never runs and the errno surfaces.

End-to-end ordering stays exercised by the integration tests (extra_handler_runs_after_builtin_returns_continue, builtin_non_continue_blocks_extra, chain_of_extras_runs_in_insertion_order).

@dzerik force-pushed the feature/extra-handlers branch from 431c207 to c602c47 on May 1, 2026, 16:04
dzerik added a commit to dzerik/sandlock that referenced this pull request May 1, 2026
Symmetric to the existing default-deny coverage, exercises the
user-specified branch of `deny_syscall_numbers` (when
`Policy::deny_syscalls` is set, it overrides DEFAULT_DENY).  Without
this branch covered, a future regression in user-list resolution would
silently let an extra register on a caller-denied syscall and
`Continue` would translate to `SECCOMP_USER_NOTIF_FLAG_CONTINUE`,
bypassing the kernel deny.

Both tests use `SYS_mremap`: it is recognised by `syscall_name_to_nr`
but not present in `DEFAULT_DENY_SYSCALLS`, so it lands on the deny
list only via the user-supplied branch — isolating that path from the
default-deny path covered by
`extra_handler_on_default_deny_syscall_is_rejected`.

- Unit (`seccomp::dispatch::extra_handler_tests::
  validate_extras_rejects_user_specified_deny`): drives
  `validate_extras_against_policy` directly, no kernel dependency, so
  the contract is enforced even on hosts where seccomp integration
  tests are skipped.
- Integration (`test_extra_handlers::
  extra_handler_on_user_specified_deny_is_rejected`): drives the full
  `Sandbox::run_with_extra_handlers` rejection path; asserts the
  offending syscall number is surfaced in the error.

Addresses review feedback on PR multikernel#20.

Signed-off-by: dzerik <dzerik@gmail.com>
@dzerik
Author

dzerik commented May 1, 2026

Added.

  • Unit (seccomp::dispatch::extra_handler_tests::validate_extras_rejects_user_specified_deny) — drives validate_extras_against_policy directly with Policy::builder().deny_syscalls(vec!["mremap".into()]); no kernel dependency, so the contract holds even on hosts where seccomp integration tests are skipped.
  • Integration (test_extra_handlers::extra_handler_on_user_specified_deny_is_rejected) — drives the full Sandbox::run_with_extra_handlers rejection path; asserts the offending syscall number is surfaced in SandboxError::Child.

Both use SYS_mremap because it is in syscall_name_to_nr but not in DEFAULT_DENY_SYSCALLS — putting it into deny_syscalls is the only way it lands on the deny list, so the test isolates the user-supplied branch of deny_syscall_numbers from the default-deny branch already covered by extra_handler_on_default_deny_syscall_is_rejected.

Rebased on origin/main 44b78fa to clear merge conflicts. While doing the rebase I also tightened a couple of things:

  • run_with_extra_handlers error message now names both DEFAULT_DENY_SYSCALLS and policy.deny_syscalls. Previously it said only "default-deny list", which under-described the new branch the test exercises.
  • validate_extras_against_policy doc-comment now covers the allowlist-mode branch (policy.allow_syscalls = Some(_) ⇒ empty deny list ⇒ Ok(()); sound because the BPF deny block is also empty in that mode) and notes why the function is pub(crate) rather than pub (only safe consumption path is run_with_extra_handlers, and that calls it pre-fork).
  • docs/extension-handlers.md §3.2 adds a Continue-site safety note about not holding SupervisorCtx locks across .await (extends the contract from #27, "Sandbox escape by racing seccomp notifications?", to user handlers). §3.4 fixes the catch_unwind snippet: the previous one didn't compile because std::panic::catch_unwind is synchronous; the async equivalent is futures::FutureExt::catch_unwind. §6 drops the hardcoded test count.
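
To make the sync-versus-async point concrete: `std::panic::catch_unwind` takes a closure, so guarding a future means wrapping each `poll` call. The sketch below does that with the standard library only, as a stand-in for what `futures::FutureExt::catch_unwind` provides. `CatchPanic`, `guarded`, `block_on`, the `Action` enum, and the "map a panic to `Errno(5)`" fallback are all illustrative assumptions, not the snippet from §3.4.

```rust
use std::future::Future;
use std::panic::{catch_unwind, AssertUnwindSafe};
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Simplified stand-in for `NotifAction`.
#[derive(Debug, PartialEq)]
pub enum Action {
    Continue,
    Errno(i32),
}

/// Wraps a future and turns a panic raised inside `poll` into `Err(())`.
pub struct CatchPanic<F>(Pin<Box<F>>);

impl<F: Future> Future for CatchPanic<F> {
    type Output = Result<F::Output, ()>;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let inner = self.0.as_mut();
        // The synchronous catch_unwind guards one poll call at a time.
        match catch_unwind(AssertUnwindSafe(|| inner.poll(cx))) {
            Ok(Poll::Ready(v)) => Poll::Ready(Ok(v)),
            Ok(Poll::Pending) => Poll::Pending,
            Err(_) => Poll::Ready(Err(())),
        }
    }
}

/// Run a handler future; a panic becomes a fixed errno (illustrative policy).
pub async fn guarded<F: Future<Output = Action>>(handler: F) -> Action {
    CatchPanic(Box::pin(handler)).await.unwrap_or(Action::Errno(5))
}

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Minimal busy-polling executor, adequate only for this demo.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(v) = fut.as_mut().poll(&mut cx) {
            return v;
        }
    }
}

async fn ok_handler() -> Action {
    Action::Continue
}

async fn buggy_handler() -> Action {
    panic!("handler bug")
}

/// Runs one well-behaved and one panicking handler through the guard.
pub fn demo() -> (Action, Action) {
    (block_on(guarded(ok_handler())), block_on(guarded(buggy_handler())))
}
```

The panic is contained per handler call and the supervisor loop keeps running. The trade-off noted earlier in the thread still applies: a blanket guard like this can hide handler bugs, which is the stated reason for leaving it to the downstream crate.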

All 227 sandlock-core unit tests pass; all 8 integration tests in test_extra_handlers.rs pass (existing 7 + new 1). Head: c602c47.

dzerik added 4 commits May 1, 2026 19:39
Adds a public extension point for downstream crates that need to
register their own seccomp-notification handlers alongside sandlock's
builtin chroot/cow/procfs/network/port_remap logic.

Motivation: downstream crates that want to intercept additional
syscalls in the same supervisor task as sandlock's builtins have no
clean way to do it today — one SECCOMP_FILTER_FLAG_NEW_LISTENER per
process means a single listener, so a second supervisor cannot run
alongside.  The only alternative is forking sandlock or patching
notif::supervisor wholesale.

API:
- New type dispatch::ExtraHandler { syscall_nr, handler }.
- New entry Sandbox::run_with_extra_handlers(policy, cmd, extras).
- Existing Sandbox::run() delegates to it with empty extras — zero
  behaviour change for current callers.

Ordering contract (documented + tested):
- Builtins register first (chroot path normalization, COW, procfs, …).
- Extras appended last, in the Vec order.
- Chain stops at first non-Continue — user handlers cannot subvert
  builtin confinement.

BPF coverage (this is what plumbs extras to the kernel):
- Sandbox::do_spawn collects the syscall numbers from extra_handlers
  and threads them into the child via the new ChildSpawnArgs.extra_syscalls
  field on context::confine_child.
- The child merges them into notif_syscalls(policy) before
  bpf::assemble_filter, with sort + dedup so a syscall registered both
  by a builtin and an extra produces a single JEQ.
- Without this step the kernel would never raise USER_NOTIF for a
  syscall that has no builtin handler — the dispatch table would
  receive nothing and the user handler would silently never fire.

Default-deny bypass guard:
- The cBPF program emits notif JEQs before deny JEQs, so a syscall
  present in both lists hits SECCOMP_RET_USER_NOTIF first.  An extra
  on a DEFAULT_DENY syscall would therefore convert a kernel-deny into
  a user-supervised path, and a Continue from the handler would
  silently bypass deny.
- Sandbox::run_with_extra_handlers now validates extras against the
  policy's deny list at registration time via
  dispatch::validate_extras_against_policy and returns
  SandboxError::Child naming the offending syscall — no silent footgun.

Internals:
- build_dispatch_table now takes Vec<ExtraHandler> and drains it into
  register() calls after builtins.
- notif::supervisor signature extended to accept extras and pass them
  through.  sandbox.rs moves self.extra_handlers via std::mem::take
  on spawn (HandlerFn is Box<dyn Fn> — not Clone).
- confine_child's seven positional parameters packed into
  context::ChildSpawnArgs to keep the call site readable.

Docs:
- docs/extension-handlers.md: design rationale, security boundary,
  panics policy, non-goals, downstream sketch.  Adds §3.0 (BPF-filter
  merge semantics) and §3.0.1 (default-deny bypass guard); corrects
  the NotifAction variant table (ReturnValue, Kill { sig, pgid }).
- crates/sandlock-core/examples/openat_audit.rs: runnable example.

Tests:
- 4 unit tests on dispatch::extra_handler_tests (ctor, insertion
  order, append-after-builtin, empty-extras nop).
- 7 integration tests under tests/integration/test_extra_handlers.rs
  exercising the full kernel path:
  * extra on SYS_uname (not intercepted by any builtin) returning
    Errno(EACCES) reaches the guest;
  * Continue lets the kernel resume the syscall;
  * empty extras vector preserves baseline behaviour;
  * cross-handler ordering: extra on SYS_openat fires after the
    /proc-virtualization builtin returns Continue;
  * registration on SYS_mount (DEFAULT_DENY) is rejected up-front
    with a descriptive error;
  * builtin non-Continue blocks extra: openat on /proc/1/cmdline is
    rejected by the procfs builtin and is never observed by the
    extra (path inspected via process_vm_readv), while a peer
    openat on /etc/hostname is observed — proves the chain stops at
    first non-Continue end-to-end through the kernel;
  * chain of two extras on the same syscall: first returns Continue,
    second returns Errno(EACCES) — both counters increment in lock
    step (insertion order preserved) and the guest sees the EACCES.
- All 215 unit tests pass; the 178-test integration suite passes
  modulo the pre-existing 54-test failure set observed on origin/main
  (kernel/capability environment, unrelated to this change).

Minor bump 0.6 → 0.7 suggested.

Signed-off-by: dzerik <dzerik@gmail.com>
The integration tests and extra_handler_ctor_preserves_fields use full
parameter names (_notif, _ctx, _fd).  The dispatch ordering unit tests
inherited the abbreviated form (_n, _c, _f) from earlier iterations of
this test mod; align them with the rest for in-mod consistency.

No behaviour change — all 228 sandlock-core unit tests and 8
extra_handlers integration tests still pass.

Signed-off-by: dzerik <dzerik@gmail.com>
Downstream crates that use Sandbox::run_with_extra_handlers need to
inspect and modify guest memory from inside their handlers — read the
path argument of openat, copy a write() buffer into a chunked S3 upload,
synthesise a fake getdents64 directory listing.

The TOCTOU-safe wrappers in seccomp::notif were already correct
(id_valid before + after process_vm_readv / process_vm_writev), they
were just pub(crate). Promote them to pub. The internal *_vm variants
remain private.

NotifError and SeccompNotif are already public, so this completes the
guest-memory-access API surface for ExtraHandlers.

Signed-off-by: dzerik <dzerik@gmail.com>
@dzerik force-pushed the feature/extra-handlers branch from c602c47 to 9f7d1b2 on May 1, 2026, 16:50
@dzerik
Author

dzerik commented May 1, 2026

Apologies — missed the two inline review comments on my previous pass and only addressed the issue-level "No test for policy.deny_syscalls" question. Both inline points addressed now (replied in-thread on each); summary of what landed:

  • dispatch.rs:880 (ordering) — replaced the two chain.handlers.len()-only unit tests with three that drive DispatchTable::dispatch directly against a minimal SupervisorCtx, asserting the recorded handler-invocation sequence end-to-end (registration order, builtin-before-extra, short-circuit on first non-Continue).
  • docs/extension-handlers.md:15 (formal doc) — rewrote in design-doc tone, removed PR-narrative phrasing, restructured into API / Semantics / Security boundary / Panics / Use cases / Limitations / Backwards compatibility / Downstream usage.
  • style nit: aligned handler-closure parameter names in the new unit tests (_n, _c, _f → _notif, _ctx, _fd) with the convention used everywhere else in the test mod and the integration suite.

The policy.deny_syscalls tests (unit + integration) from my previous comment are unchanged.

Latest head: 9f7d1b2.

The arm64 Ubuntu runner (and aarch64 hosts in general) do not have
`/lib64` — the dynamic linker lives under `/lib/aarch64-linux-gnu/`.
A strict `fs_read("/lib64")` makes Landlock refuse to add the rule
and the child exits before completing confinement, surfacing in the
parent as `pipe closed before 4 bytes read`.

Switch to `fs_read_if_exists("/lib64")` to mirror the convention
already used by `test_dry_run`, `test_fork`, `test_netlink_virt`,
and `test_landlock`.  Brings the six previously-failing tests on
the `ubuntu-24.04-arm` CI job back to green; x86_64 unchanged.

Signed-off-by: dzerik <dzerik@gmail.com>
@dzerik
Author

dzerik commented May 1, 2026

Two follow-ups after the previous push:

1. arm64 CI fix. The ubuntu-24.04-arm Rust-tests job went red on 9f7d1b2 with six test_extra_handlers::* failing as read notif fd from child: pipe closed before 4 bytes read. Root cause: my base_policy() helper used a strict fs_read("/lib64"), but aarch64 Ubuntu hosts have no /lib64 (the dynamic linker lives under /lib/aarch64-linux-gnu/), so Landlock refused the rule and the child exited before completing confinement. Switched to fs_read_if_exists("/lib64") to mirror the convention already used in test_dry_run, test_fork, test_netlink_virt, and test_landlock. Fixed in 8f20ab8.

2. pub fn read_child_mem / write_child_mem (272ae0d). Slipped into the push along with the arm64 fix — happy to split into a separate PR if you prefer that scope discipline. It does feel logically tied to ExtraHandler in practice: without those helpers public, downstream user handlers can intercept a syscall but cannot read the path argument (openat/unlinkat/...) or the buffer pointer (write/writev/...) the kernel passed by reference, which makes most realistic ExtraHandler use cases impossible. The internal *_vm variants stay private; only the TOCTOU-safe wrappers (id_valid before + after process_vm_readv/process_vm_writev) are promoted from pub(crate) to pub. 12-line diff, no behaviour change.
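
The "id_valid before + after" pattern can be shown against a fake guest. This is only a model: the `Guest` trait, `FakeGuest`, `read_checked`, and `ReadError` are hypothetical, and the real wrappers operate on a notif fd and `process_vm_readv` rather than an in-memory buffer.

```rust
use std::cell::Cell;

#[derive(Debug, PartialEq)]
pub enum ReadError {
    StaleNotification,
}

/// Hypothetical abstraction over "is this notification still live" and a raw read.
pub trait Guest {
    fn id_valid(&self, notif_id: u64) -> bool;
    fn read_raw(&self, addr: usize, len: usize) -> Vec<u8>;
}

/// Check validity, read, then check again: if the guest died or its pid was
/// recycled during the read, the bytes are discarded rather than trusted.
pub fn read_checked<G: Guest>(
    guest: &G,
    notif_id: u64,
    addr: usize,
    len: usize,
) -> Result<Vec<u8>, ReadError> {
    if !guest.id_valid(notif_id) {
        return Err(ReadError::StaleNotification);
    }
    let data = guest.read_raw(addr, len);
    if !guest.id_valid(notif_id) {
        return Err(ReadError::StaleNotification);
    }
    Ok(data)
}

/// Test double: the notification stays valid for a fixed number of checks.
pub struct FakeGuest {
    pub valid_checks_left: Cell<u32>,
    pub memory: Vec<u8>,
}

impl Guest for FakeGuest {
    fn id_valid(&self, _notif_id: u64) -> bool {
        let left = self.valid_checks_left.get();
        self.valid_checks_left.set(left.saturating_sub(1));
        left > 0
    }

    fn read_raw(&self, addr: usize, len: usize) -> Vec<u8> {
        self.memory[addr..addr + len].to_vec()
    }
}
```

A guest that passes the first check but fails the second models a target that went away mid-read; in that case the caller gets an error instead of bytes that may belong to another process.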

Latest head: 8f20ab8.

@congwang-mk
Contributor

Are the FFI (sandlock-ffi) and the Python SDK (python/src/sandlock/) not being updated together with the Rust API?

@congwang-mk
Contributor

Really nice security work here — the deny-list bypass guard and the BPF notif-list merge show genuine kernel-level care, and the test suite (especially builtin_non_continue_blocks_extra proving the procfs builtin shadows extras) is exemplary. Going to push back on one thing though, while there's still room before downstream crates pin to it: the user-facing handler types.

The openat_audit.rs example is the simplest possible handler — count + log — and it still needs five layers of ceremony:

let audit: HandlerFn = Box::new(move |notif, _ctx, _fd| {
    let counter = Arc::clone(&counter_clone);
    Box::pin(async move {
        let n = counter.fetch_add(1, Ordering::SeqCst) + 1;
        eprintln!("[audit #{n}] pid={} openat", notif.pid);
        NotifAction::Continue
    })
});

Outer Box::new, type ascription : HandlerFn, inner Box::pin(async move {}), double Arc::clone for the nested move, and no place to put state except in captured Arcs. That's the cost of HandlerFn = Box<dyn Fn(...) -> Pin<Box<dyn Future + Send>> + Send + Sync> as the public type. Real downstream handlers (VFS engines, S3 streamers, audit pipelines) have config + connections + buffers; they want state on a struct, not in nested clone ladders.

A trait-based shape gets there in roughly the same LOC and keeps closures working via a blanket impl:

#[async_trait]
pub trait Handler: Send + Sync + 'static {
    async fn handle(&self, cx: &HandlerCtx<'_>) -> NotifAction;
}

pub struct HandlerCtx<'a> {
    pub notif: SeccompNotif,
    pub sup: &'a SupervisorCtx,
    pub notif_fd: RawFd,
}

// Closures still work — no Box::new, no Box::pin at the call site.
#[async_trait]
impl<F, Fut> Handler for F
where
    F: Fn(&HandlerCtx<'_>) -> Fut + Send + Sync + 'static,
    Fut: Future<Output = NotifAction> + Send + 'static,
{ ... }

Then the example becomes:

struct OpenAudit { count: AtomicUsize }

#[async_trait]
impl Handler for OpenAudit {
    async fn handle(&self, cx: &HandlerCtx<'_>) -> NotifAction {
        let n = self.count.fetch_add(1, Ordering::SeqCst) + 1;
        eprintln!("[audit #{n}] pid={} openat", cx.notif.pid);
        NotifAction::Continue
    }
}

Zero Arc::clone, zero Box::*, state on self.

While we're touching this surface, two related cleanups worth folding in:

  1. syscall_nr: i64 → Syscall newtype. Today ExtraHandler::new(-5, h) compiles and casts to a huge u32 in the BPF list. A checked Syscall::checked(nr) (rejects negatives and arch-unknown numbers) closes the same "silent never fires" footgun the deny-list guard already closes for the deny case.
  2. Move registration onto Sandbox instead of a parallel runner. Sandbox::new(p)?.add_handler(s, h).run(&cmd) collapses the run × run_interactive × extras matrix that currently needs a parallel run_*_with_extra_handlers for every variant, and naturally allows a Priority::Before axis later for the compliance-audit use case the docs already advertise (extras-after-builtins can't observe builtin-denied syscalls, which is a real gap for auditors).
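
A sketch of the checked newtype from item 1, to show the cast problem and one way to close it. The upper bound `MAX_KNOWN` is a placeholder assumption standing in for a real per-architecture table, and the `Option` return type is likewise a guess at the eventual API shape.

```rust
/// What happens today with a raw i64: the cast wraps instead of failing.
pub fn unchecked_cast(nr: i64) -> u32 {
    nr as u32
}

/// Hypothetical checked syscall number.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Syscall(u32);

impl Syscall {
    /// Placeholder bound; a real implementation would consult the
    /// architecture's syscall table instead.
    const MAX_KNOWN: i64 = 1023;

    /// Accept only non-negative numbers within the (assumed) known range.
    pub fn checked(nr: i64) -> Option<Self> {
        if (0..=Self::MAX_KNOWN).contains(&nr) {
            Some(Syscall(nr as u32))
        } else {
            None
        }
    }

    /// The value that would be handed to the BPF filter assembler.
    pub fn raw(self) -> u32 {
        self.0
    }
}
```

Here `unchecked_cast(-5)` yields 4294967291, a number no syscall will ever match, while `Syscall::checked(-5)` returns `None` at registration time.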

None of this changes the security work — validate_extras_against_policy moves verbatim, the dispatch chain semantics are identical, the BPF merge is unchanged. It's purely the user-facing types.

Timing argument: sandlock is at 0.7 and ExtraHandler is brand new — the breaking-change window is free right now. If a downstream crate ships against HandlerFn/pub syscall_nr: i64 first, every future tightening is a migration. Cheaper to land the security infrastructure now and follow up with the trait shape before a downstream crate pins, or fold them together in this PR if you have appetite for it.

Happy to sketch the migration as a separate PR on top if useful — the security boundary you've built is the load-bearing part and I don't want this comment to block that landing.

@dzerik
Author

dzerik commented May 2, 2026

Thanks for both — security approval makes the load-bearing part landable, and the API-shape feedback is exactly the kind of pre-pin tightening that's cheaper now than after a downstream crate ships against HandlerFn / pub syscall_nr: i64.

Strong preference for the same scope split you suggested: land the security part as-is, peel the rest into a follow-up chain. Concrete proposal, in landing order:

1. #20 (this PR): ExtraHandler API + BPF plumbing + deny-list guard + tests + docs. Mergeable now, no further changes from my side unless you flag something.

2. Follow-up A — user-facing reshape: trait Handler with a blanket impl for closures (so existing |notif, _ctx, _fd| async {...} keeps compiling), Syscall::checked(nr) newtype rejecting negative / arch-unknown numbers, Sandbox::new(p)?.add_handler(s, h).run(&cmd) collapsing the run × run_interactive × extras matrix. The Priority::Before axis lands as a follow-up enum variant when the audit use case in the docs gets a concrete consumer. Happy to take your offer to sketch this one yourself if you'd rather drive the API shape; otherwise I open it on top of merged #20 with the same test rigor (deny-list + ordering + Continue-site safety carried over to the new types).

3. Follow-up B — FFI parity (sandlock-ffi + python/src/sandlock/). Closure-async has no natural C ABI shape, so this lands on top of the trait reshape: a Handler trait can expose an FFI-friendly extern "C" fn(notif_ptr, user_data) -> NotifAction slot for predefined handlers (audit, structured logging) without dragging arbitrary closures across the ABI boundary. The Python wrapper then mirrors whatever predefined handlers the FFI surface offers — closer to the existing SDK shape than trying to ship arbitrary Python callbacks across the seccomp notif loop.

Reasoning for the ordering: writing FFI against the current Box<dyn Fn> shape and then re-migrating once the trait reshape lands would be twice the work for the same result. The trait shape is also what makes extern "C" predefined handlers ergonomic on the FFI side.
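
As a rough picture of the `extern "C"` slot mentioned in Follow-up B, the sketch below stores a C-ABI function pointer plus an opaque `user_data` pointer and calls it per notification. Everything here is hypothetical: `CNotif`, `CHandler`, `FfiHandler`, and the integer return encoding (0 for continue) are assumptions for illustration, not a proposed ABI.

```rust
use std::ffi::c_void;

/// Hypothetical C-compatible view of a notification.
#[repr(C)]
pub struct CNotif {
    pub pid: u32,
    pub syscall_nr: i32,
}

/// Hypothetical callback type; the return value encodes the action (0 = continue).
pub type CHandler = extern "C" fn(notif: *const CNotif, user_data: *mut c_void) -> i32;

/// Pairs a callback with the opaque state pointer the caller registered.
pub struct FfiHandler {
    callback: CHandler,
    user_data: *mut c_void,
}

impl FfiHandler {
    pub fn new(callback: CHandler, user_data: *mut c_void) -> Self {
        FfiHandler { callback, user_data }
    }

    /// Invoke the callback for one notification.
    pub fn handle(&self, notif: &CNotif) -> i32 {
        (self.callback)(notif as *const CNotif, self.user_data)
    }
}

/// Example predefined handler: counts calls through `user_data`.
extern "C" fn count_calls(_notif: *const CNotif, user_data: *mut c_void) -> i32 {
    // SAFETY: `demo` passes a pointer to a live `u64` that outlives both calls.
    let counter = unsafe { &mut *(user_data as *mut u64) };
    *counter += 1;
    0
}

/// Registers the counting handler and fires it twice.
pub fn demo() -> (i32, u64) {
    let mut calls: u64 = 0;
    let handler = FfiHandler::new(count_calls, &mut calls as *mut u64 as *mut c_void);
    let notif = CNotif { pid: 1234, syscall_nr: 257 };
    let rc = handler.handle(&notif);
    handler.handle(&notif);
    (rc, calls)
}
```

State travels through `user_data` rather than through captured closures, which is the property that makes this shape expressible across a C ABI.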

Two scope-questions to make sure I'm not missing intent:

  • #20 mergeable as-is? Or is there a specific change you want folded in before merge? In particular, do you want the orthogonal read_child_mem / write_child_mem exposure (272ae0d) split into its own PR? Happy to revert that commit if it tightens the scope here.
  • Drive Follow-up A yourself or delegate to me? Either works; just want to avoid both of us starting on the same diff.
